AltSports Surf Location Analysis

Author

Owen Loughery

Use the Table of Contents to help navigate the report.

Libraries used

Code
library(jsonlite)
library(tidyverse)
library(knitr)
library(rlang)
library(ggtext)

Importing and cleaning data

Importing Data:

Code
trestles <- fromJSON("C:/Users/rdlou/class/Altsports/4784.json")

surf <- read.csv("C:/Users/rdlou/Downloads/wsl_api_combined.csv/wsl_api_combined.csv")

stances <- read.csv("C:/Users/rdlou/Downloads/wsl_api_surfers_combined.csv")

Adding Stances to dataset:

Code
stances <- stances |>
  select(athlete_id = athleteId,
         stance,
         nationAbbr) |>
  distinct(athlete_id, .keep_all = TRUE)

surf <- left_join(surf, stances, by = "athlete_id")

Filtering the surf dataset to keep only completed heats from CT/CS events:

Code
ct_cs <- surf |>
  filter(heat_status == 'completed',
         tourId == 1 | tourId == 12 | tourId == 2)

ct_cs$locationName <- as.factor(ct_cs$locationName)

Each location is classified by the type of surf break it is:

Point Break: As the name implies, these are primarily long shoulder breaks over reef or cobble, and surfing here is typically maneuver-based. The waves are usually not very heavy; instead they offer shoulders that run for a long time.

Reef Break: These are heavy reef breaks that are typically surfed as barrel locations, though they can favor maneuver surfing depending on conditions. Expect heavier waves with very critical sections on top of reef.

Beach Break: These are sand bars where wave quality depends heavily on the location. The waves are usually very fast and can be surfed as barrels or with maneuvers, and most spots see a mix of both depending on how the wave is breaking. Beach breaks are usually more forgiving, and their waves are far less consistent than those at reef and point breaks.

Wave Pool: Every wave is identical for each surfer, which removes all other ocean factors, so results come down to technical skill and how well each surfer rides the wave.

Code
ct_cs <- ct_cs |>
  mutate(breakType = case_when(
    locationName %in% c("Gold Coast", "Jeffreys Bay", "Barra de la Cruz",
                        "Ribeira D'Ilhas", "Punta Roca", "Merewether Beach", "Lower Trestles", "Bells Beach") ~ "Point Break",
    locationName %in% c("Uluwatu", "Keramas", "Teahupoʻo", "Banzai Pipeline", "Pipeline",
                        "Margaret River", "Ali'i Beach", "Rottnest Island", "Sunset Beach",
                        "G-Land", "Cloudbreak") ~ "Reef Break",
    locationName %in% c("Saquarema", "Capbreton / Hossegor / Seignosse", "Supertubos",
                        "Manly Beach", "Huntington Beach", "Narrabeen", "Peniche Centre Region",
                        "Ballito", "Itauna", "North Narrabeen", "Newcastle") ~ "Beach Break",
    locationName %in% c("Lemoore", "Surf Abu Dhabi") ~ "Wavepool",
    TRUE ~ "other"
  )) |>
  group_by(eventId) |>
  mutate(
    score_std = (score - mean(score, na.rm = TRUE)) / sd(score, na.rm = TRUE)
  ) |>
  ungroup()

ct_cs <- ct_cs |>
  mutate(locationName = if_else(locationName == "Banzai Pipeline", "Pipeline", locationName))

ct <- ct_cs |>
  filter(tourId == 1)

head(ct_cs) |>
  kable()
| athlete_name | athlete_id | round_number | heat_number | score | heat_status | eventId | eventYear | eventName | eventStartDate | eventEndDate | tourId | tourName | tourNameWO | gender | locationId | locationName | stance | nationAbbr | breakType | score_std |
|:---|---:|---:|---:|---:|:---|---:|---:|:---|:---|:---|---:|:---|:---|:---|---:|:---|:---|:---|:---|---:|
| Frederico Morais | 1343 | 2 | 1 | 12.16 | completed | 2647 | 2018 | Quiksilver Pro Gold Coast | 2018-03-11 | 2018-03-22 | 1 | Men’s Championship Tour | Championship Tour | M | 4 | Gold Coast | Regular | POR | Point Break | 0.0071362 |
| Ezekiel Lau | 1957 | 2 | 1 | 9.90 | completed | 2647 | 2018 | Quiksilver Pro Gold Coast | 2018-03-11 | 2018-03-22 | 1 | Men’s Championship Tour | Championship Tour | M | 4 | Gold Coast | Regular | HAW | Point Break | -0.7832281 |
| Conner Coffin | 1215 | 2 | 2 | 12.20 | completed | 2647 | 2018 | Quiksilver Pro Gold Coast | 2018-03-11 | 2018-03-22 | 1 | Men’s Championship Tour | Championship Tour | M | 4 | Gold Coast | Regular | USA | Point Break | 0.0211249 |
| Yago Dora | 3994 | 2 | 2 | 10.60 | completed | 2647 | 2018 | Quiksilver Pro Gold Coast | 2018-03-11 | 2018-03-22 | 1 | Men’s Championship Tour | Championship Tour | M | 4 | Gold Coast | Goofy | BRA | Point Break | -0.5384250 |
| Michael Rodrigues | 2251 | 2 | 3 | 14.67 | completed | 2647 | 2018 | Quiksilver Pro Gold Coast | 2018-03-11 | 2018-03-22 | 1 | Men’s Championship Tour | Championship Tour | M | 4 | Gold Coast | Regular | BRA | Point Break | 0.8849300 |
| Sebastian Zietz | 14 | 2 | 3 | 10.80 | completed | 2647 | 2018 | Quiksilver Pro Gold Coast | 2018-03-11 | 2018-03-22 | 1 | Men’s Championship Tour | Championship Tour | M | 4 | Gold Coast | Regular | HAW | Point Break | -0.4684813 |
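
The score_std column is a z-score computed within each event: the event's mean score is subtracted from each score and the result is divided by that event's standard deviation, which puts high-scoring and low-scoring events on a common scale. A minimal base R sketch of the same calculation, using made-up numbers (the event IDs and scores below are hypothetical, not taken from the WSL data):

```r
# Hypothetical heat totals for two events (values are made up for illustration)
toy <- data.frame(
  eventId = c(1, 1, 1, 2, 2, 2),
  score   = c(10, 12, 14, 4, 6, 8)
)

# z-score within each event: subtract the event mean, divide by the event SD
toy$score_std <- ave(
  toy$score, toy$eventId,
  FUN = function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
)

toy
```

Both events end up with standardized scores of -1, 0 and 1 even though the raw totals differ by 6 points, which is why standardizing first makes locations comparable.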

Merging odds into the dataset (I had ChatGPT help with this step, since the odds file was not formatted well for merging and doing it by hand would have been extremely tedious):

Code
library(tidyverse)

# ---- 1) Load the standardized odds table from CSV ----
odds_std <- read_csv("C:/Users/rdlou/Downloads/surfer_odds_standardized.csv", show_col_types = FALSE) %>%
  mutate(
    athlete_name = str_squish(as.character(athlete_name)),
    locationName = str_squish(as.character(locationName))
  )

# ---- 2) Optional: Duplicate Rio Pro odds for both "Saquarema" and "Itauna" ----
odds_std <- bind_rows(
  odds_std,
  odds_std %>% filter(locationName == "Saquarema") %>% mutate(locationName = "Itauna")
) %>%
  distinct()


# ---- 3) Helper function to merge with your results ----
merge_odds <- function(results_df, odds_tbl = odds_std, join = c("left","inner","right","full")) {
  join <- match.arg(join)
  
  results_norm <- results_df %>%
    mutate(
      athlete_name = str_squish(as.character(athlete_name)),
      locationName = str_squish(as.character(locationName))
    )
  
  jfun <- switch(
    join,
    left  = left_join,
    inner = inner_join,
    right = right_join,
    full  = full_join
  )
  
  jfun(
    results_norm,
    odds_tbl %>% select(athlete_name, locationName, event, eventStartDate, odds, odds_format),
    by = c("athlete_name","locationName")
  )
}

# ---- Example usage ----
# Assuming your CT results are in a dataframe called `ct`
ct <- merge_odds(ct, join = "left")
# View unmatched cases:
# ct %>% anti_join(odds_std, by = c("athlete_name","locationName")) %>% count(locationName)

ct <- ct |>
  select(-event, -eventStartDate.y) |>
  rename(eventStartDate = eventStartDate.x)

ct <- ct |>
  mutate(
    odds = case_when(
      odds_format == "EU" & odds >= 2 ~ (odds - 1) * 100,
      odds_format == "EU" & odds < 2  ~ -100 / (odds - 1),
      TRUE ~ odds
    ),
    odds_format = if_else(odds_format == "EU", "US", odds_format)
  )
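
The conversion above follows the standard relationship between decimal and American odds: decimal odds of 2.0 or more map to a positive American price, (decimal - 1) * 100, and decimal odds below 2.0 map to a negative price, -100 / (decimal - 1). A small standalone sketch of the same rule, where the function name eu_to_us is hypothetical and the inputs are made up:

```r
# Convert decimal ("EU") odds to American ("US") odds.
# Decimal odds of 1 or less have no American equivalent (division by zero),
# so they are returned as NA here; that choice is an assumption of this sketch.
eu_to_us <- function(decimal) {
  ifelse(
    decimal <= 1, NA_real_,
    ifelse(decimal >= 2, (decimal - 1) * 100, -100 / (decimal - 1))
  )
}

eu_to_us(c(1.5, 2, 2.5, 34))
```

Here decimal 1.5 becomes -200, 2.0 becomes +100, 2.5 becomes +150 and 34 becomes +3300.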

Making a dataset that only includes surfer-location pairs where the surfer has competed at that location in at least two events:

Code
surfer_location_counts <- ct |>
  distinct(athlete_name, locationName, eventId) |>
  group_by(athlete_name, locationName) |>
  summarise(n_events = n(), .groups = "drop") |>
  filter(n_events >= 2)

ct_2loc <- ct |>
  inner_join(surfer_location_counts, by = c("athlete_name", "locationName"))

Work process of getting to current graphs

Here I am looking at the standardized scores for each location, grouped by break type.

Here are graphs for some of the top surfers.

(I decided to only look at Championship Tour events here. At locations that only host CS events, these surfers always had a high standardized score regardless of break type or location, simply because they were better than the competition.)

Code
italo <- ct_2loc |> 
  filter(athlete_name == 'Italo Ferreira',
         tourId == 1)


italo |>
  ggplot(aes(x = reorder(locationName, score_std, FUN = median), y = score_std, fill = breakType)) +
  geom_boxplot() +
  labs(
    title = "Distribution of Standardized Scores by \n Location and Break Type",
    subtitle = "Italo Ferreira",
    x = "Location",
    y = "score_std",
    fill = "Break Type"
  ) +
  theme_minimal() +
  coord_flip()

Code
Yago_Dora <- ct_2loc |> 
  filter(athlete_name == 'Yago Dora',
         tourId == 1)


Yago_Dora |>
  ggplot(aes(x = reorder(locationName, score_std, FUN = median), y = score_std, fill = breakType)) +
  geom_boxplot() +
  labs(
    title = "Distribution of Standardized Scores by \n Location and Break Type",
    subtitle = "Yago Dora",
    x = "Location",
    y = "score_std",
    fill = "Break Type"
  ) +
  theme_minimal() +
  coord_flip()

Code
Ethan_Ewing <- ct_2loc |> 
  filter(athlete_name == 'Ethan Ewing',
         tourId == 1)


Ethan_Ewing |>
  ggplot(aes(x = reorder(locationName, score_std, FUN = median), y = score_std, fill = breakType)) +
  geom_boxplot() +
  labs(
    title = "Distribution of Standardized Scores by \n Location and Break Type",
    subtitle = "Ethan Ewing",
    x = "Location",
    y = "score_std",
    fill = "Break Type"
  ) +
  theme_minimal() +
  coord_flip()

Code
Griffin_Colapinto <- ct_2loc |> 
  filter(athlete_name == 'Griffin Colapinto',
         tourId == 1)


Griffin_Colapinto |>
  ggplot(aes(x = reorder(locationName, score_std, FUN = median), y = score_std, fill = breakType)) +
  geom_boxplot() +
  labs(
    title = "Distribution of Standardized Scores by \n Location and Break Type",
    subtitle = "Griffin Colapinto",
    x = "Location",
    y = "score_std",
    fill = "Break Type"
  ) +
  theme_minimal() +
  coord_flip()

Next I explored lower-performing CT surfers. I was curious whether break type has more influence on these surfers, since they probably did really well at some events but not others, unlike the top surfers on the tour, who probably score well regardless of break type.

Code
Jordy_Smith <- ct_2loc |> 
  filter(athlete_name == 'Jordy Smith',
         tourId == 1)


Jordy_Smith |>
  ggplot(aes(x = reorder(locationName, score_std, FUN = median), y = score_std, fill = breakType)) +
  geom_boxplot() +
  labs(
    title = "Distribution of Standardized Scores by \n Location and Break Type",
    subtitle = "Jordy Smith",
    x = "Location",
    y = "score_std",
    fill = "Break Type"
  ) +
  theme_minimal() +
  coord_flip()

Code
Kanoa_Igarashi <- ct_2loc |> 
  filter(athlete_name == 'Kanoa Igarashi',
         tourId == 1)


Kanoa_Igarashi |>
  ggplot(aes(x = reorder(locationName, score_std, FUN = median), y = score_std, fill = breakType)) +
  geom_boxplot() +
  labs(
    title = "Distribution of Standardized Scores by \n Location and Break Type",
    subtitle = "Kanoa Igarashi",
    x = "Location",
    y = "score_std",
    fill = "Break Type"
  ) +
  theme_minimal() +
  coord_flip()

Code
Miguel_Pupo <- ct_2loc |> 
  filter(athlete_name == 'Miguel Pupo',
         tourId == 1)


Miguel_Pupo |>
  ggplot(aes(x = reorder(locationName, score_std, FUN = median), y = score_std, fill = breakType)) +
  geom_boxplot() +
  labs(
    title = "Distribution of Standardized Scores by \n Location and Break Type",
    subtitle = "Miguel Pupo",
    x = "Location",
    y = "score_std",
    fill = "Break Type"
  ) +
  theme_minimal() +
  coord_flip()

Code
Seth_Moniz <- ct_2loc |> 
  filter(athlete_name == 'Seth Moniz',
         tourId == 1)


Seth_Moniz |>
  ggplot(aes(x = reorder(locationName, score_std, FUN = median), y = score_std, fill = breakType)) +
  geom_boxplot() +
  labs(
    title = "Distribution of Standardized Scores \n by Location and Break Type",
    subtitle = "Seth Moniz",
    x = "Location",
    y = "score_std",
    fill = "Break Type"
  ) +
  theme_minimal() +
  coord_flip()

Code
Joao_Chianca <- ct_cs |> 
  filter(athlete_name == 'Joao Chianca',
         tourId == 1)


Joao_Chianca |>
  ggplot(aes(x = reorder(locationName, score_std, FUN = median), y = score_std, fill = breakType)) +
  geom_boxplot() +
  labs(
    title = "Distribution of Standardized Scores \n by Location and Break Type",
    subtitle = "Joao Chianca",
    x = "Location",
    y = "score_std",
    fill = "Break Type"
  ) +
  theme_minimal() +
  coord_flip()

Code
Leonardo_Fioravanti <- ct_cs |> 
  filter(athlete_name == 'Leonardo Fioravanti',
         tourId == 1)


Leonardo_Fioravanti |>
  ggplot(aes(x = reorder(locationName, score_std, FUN = median), y = score_std, fill = breakType)) +
  geom_boxplot() +
  labs(
    title = "Distribution of Standardized Scores \n by Location and Break Type",
    subtitle = "Leonardo Fioravanti",
    x = "Location",
    y = "score_std",
    fill = "Break Type"
  ) +
  theme_minimal() +
  coord_flip()

Code
Filipe_Toledo <- ct_cs |> 
  filter(athlete_name == 'Filipe Toledo',
         tourId == 1)


Filipe_Toledo |>
  ggplot(aes(x = reorder(locationName, score_std, FUN = median), y = score_std, fill = breakType)) +
  geom_boxplot() +
  labs(
    title = "Distribution of Standardized Scores \n by Location and Break Type",
    subtitle = "Filipe Toledo",
    x = "Location",
    y = "score_std",
    fill = "Break Type"
  ) +
  theme_minimal() +
  coord_flip()

Code
Jack_Robinson <- ct_cs |> 
  filter(athlete_name == 'Jack Robinson',
         tourId == 1)


Jack_Robinson |>
  ggplot(aes(x = reorder(locationName, score_std, FUN = median), y = score_std, fill = breakType)) +
  geom_boxplot() +
  labs(
    title = "Distribution of Standardized Scores \n by Location and Break Type",
    subtitle = "Jack Robinson",
    x = "Location",
    y = "score_std",
    fill = "Break Type"
  ) +
  theme_minimal() +
  coord_flip()

I also looked into the Challenger Series, but there were not enough competitions or data to draw conclusions about the effect of break type or location.

Code
Jake_Marshall <- ct_cs |> 
  filter(athlete_name == 'Jake Marshall',
         tourId == 12)


Jake_Marshall |>
  ggplot(aes(x = reorder(locationName, score_std, FUN = median), y = score_std, fill = breakType)) +
  geom_boxplot() +
  labs(
    title = "Distribution of Standardized Scores \n by Location and Break Type for CS",
    subtitle = "Jake Marshall",
    x = "Location",
    y = "score_std",
    fill = "Break Type"
  ) +
  theme_minimal() +
  coord_flip()

Code
Crosby_Colapinto <- ct_cs |> 
  filter(athlete_name == 'Crosby Colapinto',
         tourId == 12)


Crosby_Colapinto |>
  ggplot(aes(x = reorder(locationName, score_std, FUN = median), y = score_std, fill = breakType)) +
  geom_boxplot() +
  labs(
    title = "Distribution of Standardized Scores \n by Location and Break Type for CS",
    subtitle = "Crosby Colapinto",
    x = "Location",
    y = "score_std",
    fill = "Break Type"
  ) +
  theme_minimal() +
  coord_flip()

Looking into which surfers have competed in at least 5 CS seasons and 3 CT seasons (the code counts distinct event years per tour, not individual events):

Code
ct_cs_years <- ct_cs %>%
  distinct(athlete_name, tourId, eventYear)

surfer_counts <- ct_cs_years %>%
  group_by(athlete_name, tourId) %>%
  summarise(n_years = n(), .groups = "drop")

surfer_wide <- surfer_counts %>%
  tidyr::pivot_wider(names_from = tourId, values_from = n_years, names_prefix = "tour_")

dual_tour_surfer <- surfer_wide %>%
  filter(tour_1 >= 3, tour_12 >= 5)

dual_tour_surfer |>
  select(athlete_name) |>
  kable()
athlete_name
Callum Robson
Cole Houshmand
Crosby Colapinto
Deivid Silva
Eli Hanneman
Frederico Morais
Ian Gentil
Imaikalani deVault
Jackson Baker
Jacob Willcox
Jadson Andre
Joan Duru
Joao Chianca
Jorgann Couzinet
Kauli Vaast
Kolohe Andino
Mateus Herdy
Matthew McGillivray
Michael Rodrigues
Morgan Cibilic
Nat Young
Samuel Pupo
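
Note that the code above counts distinct eventYear values, so the thresholds apply to seasons on each tour rather than to individual events. If a per-event threshold were wanted instead, the same pattern works with eventId. A base R sketch on a made-up table (surfers "A" and "B" and all IDs are hypothetical, and count_distinct is a helper invented for this example):

```r
# Hypothetical results: one row per surfer per event
toy <- data.frame(
  athlete_name = c("A", "A", "A", "A", "B", "B"),
  tourId       = c(1, 1, 1, 12, 1, 1),
  eventId      = c(101, 102, 103, 201, 101, 104),
  eventYear    = c(2023, 2023, 2024, 2024, 2023, 2023)
)

# Count distinct values of `col` for every surfer/tour combination
count_distinct <- function(df, col) {
  out <- aggregate(
    df[[col]],
    by  = list(athlete_name = df$athlete_name, tourId = df$tourId),
    FUN = function(x) length(unique(x))
  )
  names(out)[names(out) == "x"] <- paste0("n_", col)
  out
}

seasons <- count_distinct(toy, "eventYear")  # what the report's code measures
events  <- count_distinct(toy, "eventId")    # per-event alternative

both <- merge(seasons, events)
both
```

Surfer "A" has three CT events but only two CT seasons, so the two definitions can give different answers for the same surfer.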

Here I just did some testing out of curiosity

Code
library(dplyr)
library(purrr)
library(broom)

surfer_pvals <- ct |>
  filter(!is.na(score_std), !is.na(breakType)) |>
  group_by(athlete_name) |>
  filter(n_distinct(breakType) > 1) |>
  nest() |>
  mutate(
    model = map(data, ~ lm(score_std ~ breakType, data = .x)),
    anova = map(model, ~ anova(.x))
  ) |>
  mutate(
    breakType_pval = map_dbl(anova, ~ .x$`Pr(>F)`[1])
  )

Using ANOVA to test whether standardized scores vary significantly by break type for each individual surfer:

Code
surfer_pvals |>
  filter(breakType_pval <= .1) |>
  select(athlete_name, breakType_pval)
# A tibble: 13 × 2
# Groups:   athlete_name [13]
   athlete_name      breakType_pval
   <chr>                      <dbl>
 1 Yago Dora               0.0270  
 2 Kanoa Igarashi          0.0465  
 3 Jesse Mendes            0.0232  
 4 Caio Ibelli             0.0828  
 5 Filipe Toledo           0.00115 
 6 Michel Bourez           0.00627 
 7 Griffin Colapinto       0.0909  
 8 Jeremy Flores           0.0981  
 9 Jack Robinson           0.0223  
10 Alejo Muniz             0.00505 
11 Wiggolly Dantas         0.00572 
12 Barron Mamiya           0.000204
13 Kauli Vaast             0.00628 

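13 of the 87 surfers have p-values at or below 0.10, so their scores appear to vary with break type. With 87 separate tests, roughly 9 would be expected to fall below 0.10 by chance alone, so this is exploratory evidence rather than proof of a strong effect.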
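
Since one ANOVA is run per surfer, some small p-values are expected by chance. One common, conservative way to account for that is a Bonferroni adjustment, which multiplies each p-value by the number of tests. The sketch below applies base R's p.adjust() to the 13 p-values printed above, passing n = 87 for the total number of surfers tested. The inputs are the rounded values from the output, so the adjusted numbers are approximate.

```r
# p-values copied from the table above (rounded as printed)
pvals <- c(
  "Yago Dora" = 0.0270, "Kanoa Igarashi" = 0.0465, "Jesse Mendes" = 0.0232,
  "Caio Ibelli" = 0.0828, "Filipe Toledo" = 0.00115, "Michel Bourez" = 0.00627,
  "Griffin Colapinto" = 0.0909, "Jeremy Flores" = 0.0981, "Jack Robinson" = 0.0223,
  "Alejo Muniz" = 0.00505, "Wiggolly Dantas" = 0.00572, "Barron Mamiya" = 0.000204,
  "Kauli Vaast" = 0.00628
)

# Bonferroni: multiply by the total number of tests (87 surfers), capped at 1
adj <- p.adjust(pvals, method = "bonferroni", n = 87)

sort(round(adj, 3))
```

With these inputs only the smallest p-value stays below 0.05 after adjustment. Bonferroni is strict, so this is a conservative reading and does not show that break type is irrelevant for the other surfers.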

Stances correlation

Code
ct |>
  filter(!is.na(stance)) |>
  ggplot(aes(x = reorder(locationName, score_std, FUN = median), y = score_std, fill = stance)) +
  geom_boxplot(position = position_dodge(), outlier.shape = NA) +
  labs(
    title = "Standardized Score by Location and Stance",
    x = "Location",
    y = "Standardized Score (score_std)",
    fill = "Stance"
  ) +
  theme_minimal() +
  coord_flip()

T-tests showing the effect of a surfer's stance on score_std at each location. (The histogram and Levene test underneath check whether the normality and equal-variance assumptions are met.)

Code
stance_tests <- ct |>
  filter(!is.na(score_std), !is.na(stance)) |>
  group_by(locationName) |>
  filter(n_distinct(stance) == 2) |>
  summarise(
    p_value = tryCatch(t.test(score_std ~ stance)$p.value, error = function(e) NA),
    .groups = 'drop'
  )

stance_tests |>
  kable()
| locationName | p_value |
|:---|---:|
| Barra de la Cruz | 0.2615045 |
| Bells Beach | 0.0701139 |
| Capbreton / Hossegor / Seignosse | 0.8207040 |
| Cloudbreak | 0.4091162 |
| G-Land | 0.1927151 |
| Gold Coast | 0.1376296 |
| Itauna | 0.1469320 |
| Jeffreys Bay | 0.6607112 |
| Keramas | 0.1768840 |
| Lemoore | 0.3127732 |
| Lower Trestles | 0.7021434 |
| Merewether Beach | 0.2203692 |
| Narrabeen | 0.1815822 |
| Peniche Centre Region | 0.6648567 |
| Pipeline | 0.8302563 |
| Punta Roca | 0.6199450 |
| Rottnest Island | 0.0067027 |
| Sunset Beach | 0.0326178 |
| Supertubos | 0.0011528 |
| Surf Abu Dhabi | 0.2783283 |
| Teahupoʻo | 0.0000022 |
| Uluwatu | 0.8403997 |
Code
ggplot(ct, aes(x = score_std)) +
  geom_histogram(bins = 30) +
  facet_wrap(~ locationName)

Code
library(car)
leveneTest(score_std ~ locationName, data = ct) |>
  kable()
|       | Df   | F value   | Pr(>F)    |
|:------|-----:|----------:|----------:|
| group | 23   | 0.4135738 | 0.9938537 |
|       | 4438 | NA        | NA        |
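
Two details are worth keeping in mind when reading these checks. First, t.test() runs Welch's test by default, which does not assume equal variances between the two groups. Second, the Levene test above compares variances across locations, whereas the t-tests compare the two stances within a location. A sketch of checking variance between the stance groups directly, using simulated data and base R's fligner.test() as a stand-in for car::leveneTest() (the data and the choice of test are assumptions of this example):

```r
set.seed(42)

# Simulated scores for one location (made-up data, not from the WSL files)
toy <- data.frame(
  stance    = rep(c("Goofy", "Regular"), times = c(40, 60)),
  score_std = c(rnorm(40, mean = 0.1), rnorm(60, mean = 0))
)

# Default: Welch's t-test, which does not assume equal variances
welch <- t.test(score_std ~ stance, data = toy)

# Pooled-variance (Student) t-test, which does assume them
pooled <- t.test(score_std ~ stance, data = toy, var.equal = TRUE)

# Fligner-Killeen test: are the variances of the two stance groups equal?
fk <- fligner.test(score_std ~ stance, data = toy)

c(welch = welch$p.value, pooled = pooled$p.value, fligner = fk$p.value)
```

If the variance test gives no reason to doubt equal variances, the Welch and pooled results will usually be close, and the Welch default is a safe choice either way.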

T-tests to see whether standardized scores differ significantly between goofy-foot and regular-foot surfers for each break type:

Code
surf_clean <- ct |>
  filter(!is.na(score_std), !is.na(stance), !is.na(breakType),
         stance == "Goofy" | stance == "Regular")

results <- surf_clean |>
  group_by(breakType) |>
  do(tidy(t.test(score_std ~ stance, data = .)))

results |>
  select(breakType, estimate, estimate1, estimate2, p.value, conf.low, conf.high) |>
  kable()
| breakType | estimate | estimate1 | estimate2 | p.value | conf.low | conf.high |
|:---|---:|---:|---:|---:|---:|---:|
| Beach Break | 0.2093518 | 0.1470319 | -0.0623199 | 0.0023001 | 0.0750745 | 0.3436291 |
| Point Break | 0.0246317 | 0.0178890 | -0.0067427 | 0.6888973 | -0.0961281 | 0.1453915 |
| Reef Break | 0.0430985 | 0.0310203 | -0.0120782 | 0.3838134 | -0.0539664 | 0.1401634 |
| Wavepool | 0.1857202 | 0.1214324 | -0.0642878 | 0.1613338 | -0.0748940 | 0.4463344 |
Code
ggplot(ct, aes(x = score_std)) +
  geom_histogram(bins = 30) +
  facet_wrap(~ breakType)

Code
leveneTest(score_std ~ breakType, data = ct) |>
  kable()
|       | Df   | F value   | Pr(>F)    |
|:------|-----:|----------:|----------:|
| group | 3    | 0.3352487 | 0.7998617 |
|       | 4458 | NA        | NA        |

Stance does not appear to have much effect on how surfers perform, apart from a few locations (Teahupoʻo, Supertubos, Rottnest Island and Sunset Beach have p-values below 0.05) and beach breaks as a group (p ≈ 0.002). So I decided not to look further into this.

These are the same graphs as earlier, but with each surfer's odds added to the location labels where available, to add depth to what is shown.

Code
italo <- ct_2loc |>
  filter(athlete_name == "Italo Ferreira", tourId == 1)


odds_by_loc <- italo |>
  distinct(locationName, odds) |>
  group_by(locationName) |>
  slice(1) |>
  ungroup()

italo_lab <- italo |>
  left_join(odds_by_loc, by = "locationName", suffix = c("", "_loc")) |>
  mutate(locationLabel = ifelse(is.na(odds_loc),
                                locationName,
                                paste0(locationName, " (", odds_loc, ")")))

italo_lab |>
  ggplot(aes(x = reorder(locationLabel, score_std, FUN = median),
             y = score_std, fill = breakType)) +
  geom_boxplot() +
  labs(
    title = "Distribution of Standardized Scores by \n Location and Break Type",
    subtitle = "Italo Ferreira — odds shown in parentheses",
    x = "Location (2025 odds)",
    y = "score_std",
    fill = "Break Type"
  ) +
  theme_minimal() +
  coord_flip()
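
The labels above paste the raw odds number after the location name, so a price of 3300 prints without its plus sign. If signed American-style labels are preferred, sprintf() can add them. A small base R sketch with made-up locations and odds:

```r
# Hypothetical locations and American odds (NA means no odds were available)
locs <- c("Location A", "Location B", "Location C")
odds <- c(3300, -150, NA)

# "%+.0f" forces an explicit sign, so 3300 prints as "+3300"
odds_txt <- sprintf("%+.0f", odds)

# Keep the bare location name when there are no odds to show
labels <- ifelse(is.na(odds), locs, paste0(locs, " (", odds_txt, ")"))
labels
```

This yields "Location A (+3300)", "Location B (-150)" and "Location C".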

Code
Jack_Robinson <- ct_2loc |>
  filter(athlete_name == "Jack Robinson", tourId == 1)


odds_by_loc <- Jack_Robinson |>
  distinct(locationName, odds) |>
  group_by(locationName) |>
  slice(1) |>
  ungroup()

Jack_Robinson_lab <- Jack_Robinson |>
  left_join(odds_by_loc, by = "locationName", suffix = c("", "_loc")) |>
  mutate(locationLabel = ifelse(is.na(odds_loc),
                                locationName,
                                paste0(locationName, " (", odds_loc, ")")))

Jack_Robinson_lab |>
  ggplot(aes(x = reorder(locationLabel, score_std, FUN = median),
             y = score_std, fill = breakType)) +
  geom_boxplot() +
  labs(
    title = "Distribution of Standardized Scores by \n Location and Break Type",
    subtitle = "Jack Robinson — odds shown in parentheses",
    x = "Location (2025 odds)",
    y = "score_std",
    fill = "Break Type"
  ) +
  theme_minimal() +
  coord_flip()

Code
Filipe_Toledo <- ct_2loc |>
  filter(athlete_name == "Filipe Toledo", tourId == 1)


odds_by_loc <- Filipe_Toledo |>
  distinct(locationName, odds) |>
  group_by(locationName) |>
  slice(1) |>
  ungroup()

Filipe_Toledo_lab <- Filipe_Toledo |>
  left_join(odds_by_loc, by = "locationName", suffix = c("", "_loc")) |>
  mutate(locationLabel = ifelse(is.na(odds_loc),
                                locationName,
                                paste0(locationName, " (", odds_loc, ")")))

Filipe_Toledo_lab |>
  ggplot(aes(x = reorder(locationLabel, score_std, FUN = median),
             y = score_std, fill = breakType)) +
  geom_boxplot() +
  labs(
    title = "Distribution of Standardized Scores by \n Location and Break Type",
    subtitle = "Filipe Toledo — odds shown in parentheses",
    x = "Location (2025 odds)",
    y = "score_std",
    fill = "Break Type"
  ) +
  theme_minimal() +
  coord_flip()

Code
Jordy_Smith <- ct_2loc |>
  filter(athlete_name == "Jordy Smith", tourId == 1)


odds_by_loc <- Jordy_Smith |>
  distinct(locationName, odds) |>
  group_by(locationName) |>
  slice(1) |>
  ungroup()

Jordy_Smith_lab <- Jordy_Smith |>
  left_join(odds_by_loc, by = "locationName", suffix = c("", "_loc")) |>
  mutate(locationLabel = ifelse(is.na(odds_loc),
                                locationName,
                                paste0(locationName, " (", odds_loc, ")")))

Jordy_Smith_lab |>
  ggplot(aes(x = reorder(locationLabel, score_std, FUN = median),
             y = score_std, fill = breakType)) +
  geom_boxplot() +
  labs(
    title = "Distribution of Standardized Scores by \n Location and Break Type",
    subtitle = "Jordy Smith — odds shown in parentheses",
    x = "Location (2025 odds)",
    y = "score_std",
    fill = "Break Type"
  ) +
  theme_minimal() +
  coord_flip()

Code
Joao_Chianca <- ct_2loc |>
  filter(athlete_name == "Joao Chianca", tourId == 1)


odds_by_loc <- Joao_Chianca |>
  distinct(locationName, odds) |>
  group_by(locationName) |>
  slice(1) |>
  ungroup()

Joao_Chianca_lab <- Joao_Chianca |>
  left_join(odds_by_loc, by = "locationName", suffix = c("", "_loc")) |>
  mutate(locationLabel = ifelse(is.na(odds_loc),
                                locationName,
                                paste0(locationName, " (", odds_loc, ")")))

Joao_Chianca_lab |>
  ggplot(aes(x = reorder(locationLabel, score_std, FUN = median),
             y = score_std, fill = breakType)) +
  geom_boxplot() +
  labs(
    title = "Distribution of Standardized Scores by \n Location and Break Type",
    subtitle = "Joao Chianca — odds shown in parentheses",
    x = "Location (2025 odds)",
    y = "score_std",
    fill = "Break Type"
  ) +
  theme_minimal() +
  coord_flip()

Code
Leonardo_Fioravanti <- ct_2loc |>
  filter(athlete_name == "Leonardo Fioravanti", tourId == 1)


odds_by_loc <- Leonardo_Fioravanti |>
  distinct(locationName, odds) |>
  group_by(locationName) |>
  slice(1) |>
  ungroup()

Leonardo_Fioravanti_lab <- Leonardo_Fioravanti |>
  left_join(odds_by_loc, by = "locationName", suffix = c("", "_loc")) |>
  mutate(locationLabel = ifelse(is.na(odds_loc),
                                locationName,
                                paste0(locationName, " (", odds_loc, ")")))

Leonardo_Fioravanti_lab |>
  ggplot(aes(x = reorder(locationLabel, score_std, FUN = median),
             y = score_std, fill = breakType)) +
  geom_boxplot() +
  labs(
    title = "Distribution of Standardized Scores by \n Location and Break Type",
    subtitle = "Leonardo Fioravanti — odds shown in parentheses",
    x = "Location (2025 odds)",
    y = "score_std",
    fill = "Break Type"
  ) +
  theme_minimal() +
  coord_flip()

Here I look at a single location and compare the standardized scores of the 15 surfers with the best bet365 odds for this past competition year.

Code
top_odds_teahupoo <- ct |>
  filter(locationName == "Teahupoʻo") |>
  distinct(athlete_name, odds) |>
  group_by(athlete_name) |>
  slice(1) |>
  ungroup() |>
  arrange(odds) |>
  slice_head(n = 15)

ct_top_odds_teahupoo <- ct |>
  semi_join(top_odds_teahupoo, by = "athlete_name") |>
  filter(locationName == "Teahupoʻo")

odds_labels <- top_odds_teahupoo |>
  mutate(athlete_name = factor(athlete_name))

ggplot(ct_top_odds_teahupoo,
       aes(x = reorder(athlete_name, score_std, FUN = median),
           y = score_std, fill = athlete_name)) +
  geom_boxplot(show.legend = FALSE) +
  geom_text(data = odds_labels,
            aes(x = athlete_name,
                y = max(ct_top_odds_teahupoo$score_std) + 0.05,
                label = paste0("Odds: ", odds)),
            inherit.aes = FALSE,
            color = "red", fontface = "bold", size = 3) +
  labs(
    title = "Surfers w/ Top 15 Odds at Teahupoʻo — Standardized Scores & Odds",
    x = "Surfer",
    y = "Standardized Score"
  ) +
  theme_minimal() +
  coord_flip(clip = "off") +
  theme(
    plot.margin = margin(5, 50, 5, 5)
  )
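
Sorting American odds in ascending order puts the favorites first, because negative prices (odds-on) sort before positive ones and smaller positive prices mean shorter odds. arrange() places missing odds last, so they would only enter the top 15 if fewer than 15 surfers had a price. A base R sketch of the same ranking that drops missing odds explicitly, on hypothetical surfers:

```r
# Hypothetical surfers and American odds (NA means no price was listed)
toy <- data.frame(
  athlete_name = c("A", "B", "C", "D", "E"),
  odds         = c(600, -120, NA, 3300, 250)
)

# Ascending order ranks favorites first; na.last = NA removes missing odds
ranked <- toy[order(toy$odds, na.last = NA), ]

# Keep the three shortest prices (the report keeps 15)
top3 <- head(ranked, 3)
top3
```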

Code
top_odds <- ct |>
  filter(locationName == "Bells Beach") |>
  distinct(athlete_name, odds) |>
  group_by(athlete_name) |>
  slice(1) |>
  ungroup() |>
  arrange(odds) |>
  slice_head(n = 15)

ct_top_odds <- ct |>
  semi_join(top_odds, by = "athlete_name") |>
  filter(locationName == "Bells Beach")

range_offset <- 0.05 * diff(range(ct_top_odds$score_std, na.rm = TRUE))

odds_labels <- ct_top_odds |>
  group_by(athlete_name) |>
  summarise(whisker_top = max(score_std, na.rm = TRUE), .groups = "drop") |>
  left_join(top_odds, by = "athlete_name") |>
  mutate(
    y_pos = whisker_top + range_offset + .1,
    athlete_name = factor(athlete_name)
  )

ggplot(ct_top_odds,
       aes(x = reorder(athlete_name, score_std, FUN = median),
           y = score_std, fill = athlete_name)) +
  geom_boxplot(show.legend = FALSE, outlier.shape = NA) +
  geom_text(data = odds_labels,
            aes(x = athlete_name, y = y_pos, label = paste0("Odds: ", odds)),
            inherit.aes = FALSE,
            color = "red", fontface = "bold", size = 3) +
  labs(
    title = "Surfers w/ Top 15 Odds at Bells Beach — Standardized Scores & Odds",
    x = "Surfer",
    y = "Standardized Score"
  ) +
  theme_minimal() +
  coord_flip(clip = "off") +
  theme(plot.margin = margin(5, 60, 5, 5),
        plot.title = element_text(hjust = 0.5)) 

Code
top_odds <- ct |>
  filter(locationName == "Margaret River") |>
  distinct(athlete_name, odds) |>
  group_by(athlete_name) |>
  slice(1) |>
  ungroup() |>
  arrange(odds) |>
  slice_head(n = 15)

ct_top_odds <- ct |>
  semi_join(top_odds, by = "athlete_name") |>
  filter(locationName == "Margaret River")

range_offset <- 0.05 * diff(range(ct_top_odds$score_std, na.rm = TRUE))

odds_labels <- ct_top_odds |>
  group_by(athlete_name) |>
  summarise(whisker_top = max(score_std, na.rm = TRUE), .groups = "drop") |>
  left_join(top_odds, by = "athlete_name") |>
  mutate(
    y_pos = whisker_top + range_offset + .1,
    athlete_name = factor(athlete_name)
  )

ggplot(ct_top_odds,
       aes(x = reorder(athlete_name, score_std, FUN = median),
           y = score_std, fill = athlete_name)) +
  geom_boxplot(show.legend = FALSE, outlier.shape = NA) +
  geom_text(data = odds_labels,
            aes(x = athlete_name, y = y_pos, label = paste0("Odds: ", odds)),
            inherit.aes = FALSE,
            color = "red", fontface = "bold", size = 3) +
  labs(
    title = "Surfers w/ Top 15 Odds at Margaret River — Standardized Scores & Odds",
    x = "Surfer",
    y = "Standardized Score"
  ) +
  theme_minimal() +
  coord_flip(clip = "off") +
  theme(plot.margin = margin(5, 60, 5, 5),
        plot.title = element_text(hjust = 0.5))

My take: surfers with better odds have generally competed well at these locations in the past. One thing to note is that some surfers, like George Pittar, have high standardized scores relative to the field but much longer odds than the surfers near them on the chart. That is because he does not usually perform well overall, so this is actually a strong result for him: instead of the roughly +6000 odds he would usually be given, he is at roughly +3000 here because of how he has done at this location in the past. His past results are not strong enough to bring him down to around +600 like the surfers near him on the chart, since he is not a highly ranked surfer on the tour.

Here are his odds at other locations for comparison.

Code
ct |>
  filter(athlete_name == "George Pittar") |>
  group_by(locationName) |>
  slice(1) |>
  ungroup() |>
  select(athlete_name, locationName, odds) |>
  filter(!is.na(odds))
# A tibble: 5 × 3
  athlete_name  locationName     odds
  <chr>         <chr>           <dbl>
1 George Pittar Bells Beach      5000
2 George Pittar Gold Coast       6600
3 George Pittar Margaret River   3300
4 George Pittar Pipeline        15000
5 George Pittar Surf Abu Dhabi 799900
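
Converting these values back to decimal odds is a quick sanity check on the merge. The helper below (us_to_eu is a hypothetical name, not part of the report's code) inverts the EU-to-US rule used earlier:

```r
# Convert American odds back to decimal odds
us_to_eu <- function(american) {
  ifelse(american > 0, american / 100 + 1, 100 / abs(american) + 1)
}

# Values copied from the table above
pittar <- c("Bells Beach" = 5000, "Gold Coast" = 6600, "Margaret River" = 3300,
            "Pipeline" = 15000, "Surf Abu Dhabi" = 799900)

us_to_eu(pittar)
```

The first four come back as 51, 67, 34 and 151, while Surf Abu Dhabi comes back as 8000. One possible explanation, which the data shown here cannot confirm, is that this row was already in American format (+8000) but labelled "EU" in the odds file, so it may be worth checking at the source.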

Final Visualizations/Analysis

Creating new variable in order to look into home nations for surfers to add that to the visualization:

Code
ct <- ct |>
  mutate(locationNation = case_when(
    locationName %in% c("Gold Coast", "Bells Beach", "Margaret River",
                        "Merewether Beach", "Narrabeen", "Rottnest Island") ~ "AUS",
    locationName %in% c("Uluwatu", "Keramas", "G-Land") ~ "INA",
    locationName == "Saquarema" | locationName == "Itauna" ~ "BRA",
    locationName == "Jeffreys Bay" ~ "RSA",
    locationName == "Teahupoʻo" ~ "PYF",
    locationName == "Lemoore" | locationName == "Lower Trestles" ~ "USA",
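    # Hossegor, Capbreton and Seignosse are in France (assumes the data's code is "FRA")
    locationName == "Capbreton / Hossegor / Seignosse" ~ "FRA",
    locationName %in% c("Supertubos", "Peniche Centre Region") ~ "POR",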
    locationName == "Pipeline" | locationName == "Sunset Beach" ~ "HAW",
    locationName == "Barra de la Cruz" ~ "MEX",
    locationName == "Punta Roca" ~ "SLV",
    locationName == "Cloudbreak" ~ "FJI",
    locationName == "Surf Abu Dhabi" ~ "UAE",
    TRUE ~ NA_character_
  ))


# Apply the nationality override first so that home_break reflects it
ct <- ct |>
  mutate(nationAbbr = if_else(athlete_name == "Kauli Vaast", "PYF", nationAbbr))

ct <- ct |>
  mutate(
    home_break = nationAbbr == locationNation
  )

location_counts <- ct |>
  distinct(athlete_name, locationName, eventId) |>
  group_by(athlete_name, locationName) |>
  summarise(n_events = n(), .groups = "drop") |>
  filter(n_events >= 2)

ct_2_loc <- ct |>
  inner_join(location_counts, by = c("athlete_name", "locationName"))
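
Because home_break is a comparison of two columns, it reflects whatever nationAbbr held at the moment it was computed. Recomputing it after any nationality override, as surfer_graph() does, keeps the flag current. A base R sketch of the effect, using a hypothetical surfer and placeholder nation codes:

```r
# Hypothetical surfer whose nation code is corrected from "AAA" to "BBB"
toy <- data.frame(
  athlete_name   = c("Surfer A", "Surfer B"),
  nationAbbr     = c("AAA", "CCC"),
  locationNation = c("BBB", "BBB")
)

# Flag computed before the override: Surfer A is not at home
toy$home_before <- toy$nationAbbr == toy$locationNation

# Apply the override, then recompute the flag
toy$nationAbbr[toy$athlete_name == "Surfer A"] <- "BBB"
toy$home_after <- toy$nationAbbr == toy$locationNation

toy[, c("athlete_name", "home_before", "home_after")]
```

Only the recomputed column marks "Surfer A" as competing at home.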

Graphs for analyzing individual surfers

These graphs show the distribution of a surfer's standardized scores by location and break type, along with Bet365's odds for the current competition season and whether each location is in the surfer's home nation. Only locations where the surfer has competed at least twice are included.

Here is the code used to create these graphs:

Code
surfer_graph <- function(surfer_name){

  Surfer <- ct_2_loc |>
    filter(athlete_name == .env$surfer_name, tourId == 1) |>
    mutate(home_break = nationAbbr == locationNation)

  odds_by_loc <- Surfer |>
    distinct(locationName, odds) |>
    group_by(locationName) |>
    slice(1) |>
    ungroup()

  Surfer_lab <- Surfer |>
    left_join(odds_by_loc, by = "locationName", suffix = c("", "_loc")) |>
    mutate(
      locationLabel = ifelse(is.na(odds_loc), locationName,
                             paste0(locationName, " (", odds_loc, ")"))
    )

  Surfer_lab |>
    ggplot(aes(x = reorder(locationLabel, score_std, FUN = median),
               y = score_std,
               fill  = breakType,
               color = home_break)) + 
    geom_boxplot(outlier.shape = NA, size = 0.7) +
    scale_color_manual(values = c(`TRUE` = "green3", `FALSE` = "black"),
                       labels = c(`TRUE` = "Home Nation", `FALSE` = "Away Nation"),
                       name = "Home/Away") +
    labs(
      title = "Distribution of Standardized Scores by Location and Break Type",
      subtitle = paste0(surfer_name, " — odds in parentheses"),
      x = "Location (2025 odds)",
      y = "Standardized Score",
      fill = "Break Type"
    ) +
    theme_minimal() +
    theme(
    plot.title    = element_text(hjust = 0.5),
    plot.subtitle = element_markdown(hjust = 0.5)
    ) +
    coord_flip()

}

surfer_table <- function(surfer_name) {
  Surfer <- ct_2_loc |>
    filter(athlete_name == .env$surfer_name, tourId == 1) |>
    mutate(home_break = nationAbbr == locationNation)
  
  counts_tbl <- Surfer |>
    group_by(locationName) |>
    summarise(
      events = n_distinct(eventId),
      heats = n(),
      breakType = first(breakType),
      .groups = "drop"
    ) |>
    arrange(desc(events), desc(heats), locationName) |>
    select(locationName, events, heats, breakType) |>
    rename("Location Name" = locationName,
           "# of Events Surfed" = events,
           "# of Heats Surfed" = heats,
           "Break Type" = breakType)
  counts_tbl |>
    kable()
}
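
The labelling and ordering used in `surfer_graph()` can be sketched in isolation. The example below uses invented scores and odds (none of these values come from the report's data) to show how odds are appended to a location label only when they are available, and how `reorder()` sorts the labels by median standardized score.

```r
# Hypothetical standardized scores and odds; values are invented.
toy_scores <- data.frame(
  locationName = c("Pipeline", "Pipeline", "Pipeline",
                   "Bells Beach", "Bells Beach", "Bells Beach",
                   "Lemoore", "Lemoore"),
  score_std    = c(1.2, 0.8, 1.0, -0.5, 0.1, -0.2, 0.4, 0.6),
  odds         = c(1500, 1500, 1500, 2000, 2000, 2000, NA, NA)
)

# Append odds to the label only when they exist, as surfer_graph() does.
toy_scores$locationLabel <- ifelse(
  is.na(toy_scores$odds),
  toy_scores$locationName,
  paste0(toy_scores$locationName, " (", toy_scores$odds, ")")
)

# reorder() sorts factor levels by the per-label median score (ascending).
ordered_labels <- levels(
  reorder(toy_scores$locationLabel, toy_scores$score_std, FUN = median)
)
print(ordered_labels)
```

With `coord_flip()`, the first factor level is drawn at the bottom of the plot, so the location with the highest median score ends up at the top.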

Barron Mamiya

Location Name # of Events Surfed # of Heats Surfed Break Type
Pipeline 4 13 Reef Break
Margaret River 4 11 Reef Break
Bells Beach 4 6 Point Break
Punta Roca 4 6 Point Break
Sunset Beach 3 8 Reef Break
Supertubos 3 7 Beach Break
Teahupoʻo 3 7 Reef Break
Itauna 2 4 Beach Break
Jeffreys Bay 2 3 Point Break
Lemoore 2 2 Wavepool

Connor O’Leary

Location Name # of Events Surfed # of Heats Surfed Break Type
Pipeline 6 11 Reef Break
Bells Beach 5 11 Point Break
Margaret River 5 10 Reef Break
Punta Roca 4 8 Point Break
Supertubos 4 6 Beach Break
Teahupoʻo 4 5 Reef Break
Jeffreys Bay 3 7 Point Break
Sunset Beach 3 6 Reef Break
Lemoore 3 5 Wavepool
Saquarema 3 4 Beach Break
Itauna 2 4 Beach Break
Gold Coast 2 3 Point Break

Ethan Ewing

Location Name # of Events Surfed # of Heats Surfed Break Type
Margaret River 5 11 Reef Break
Pipeline 5 9 Reef Break
Bells Beach 4 15 Point Break
Lower Trestles 4 9 Point Break
Punta Roca 4 8 Point Break
Supertubos 3 11 Beach Break
Sunset Beach 3 9 Reef Break
Jeffreys Bay 2 9 Point Break
Itauna 2 6 Beach Break
Lemoore 2 6 Wavepool
Saquarema 2 6 Beach Break
Teahupoʻo 2 3 Reef Break

Filipe Toledo

Location Name # of Events Surfed # of Heats Surfed Break Type
Pipeline 7 12 Reef Break
Bells Beach 5 18 Point Break
Margaret River 5 12 Reef Break
Jeffreys Bay 4 15 Point Break
Saquarema 4 15 Beach Break
Lemoore 4 10 Wavepool
Lower Trestles 4 9 Point Break
Supertubos 4 9 Beach Break
Teahupoʻo 4 8 Reef Break
Gold Coast 3 10 Point Break
Punta Roca 3 9 Point Break
Keramas 2 7 Reef Break
Sunset Beach 2 7 Reef Break
Capbreton / Hossegor / Seignosse 2 3 Beach Break

Griffin Colapinto

Location Name # of Events Surfed # of Heats Surfed Break Type
Pipeline 7 12 Reef Break
Margaret River 6 21 Reef Break
Bells Beach 5 15 Point Break
Supertubos 5 13 Beach Break
Teahupoʻo 5 8 Reef Break
Lemoore 4 11 Wavepool
Punta Roca 4 11 Point Break
Saquarema 4 9 Beach Break
Jeffreys Bay 4 6 Point Break
Sunset Beach 3 9 Reef Break
Gold Coast 3 7 Point Break
Lower Trestles 3 5 Point Break
Itauna 2 5 Beach Break
Keramas 2 4 Reef Break
Capbreton / Hossegor / Seignosse 2 2 Beach Break

Imaikalani deVault

Location Name # of Events Surfed # of Heats Surfed Break Type
Pipeline 5 8 Reef Break
Margaret River 3 8 Reef Break
Bells Beach 3 6 Point Break
Supertubos 2 4 Beach Break
Punta Roca 2 2 Point Break
Sunset Beach 2 2 Reef Break

Italo Ferreira

Location Name # of Events Surfed # of Heats Surfed Break Type
Pipeline 7 18 Reef Break
Bells Beach 6 15 Point Break
Margaret River 6 14 Reef Break
Supertubos 5 19 Beach Break
Lower Trestles 4 12 Point Break
Jeffreys Bay 4 10 Point Break
Punta Roca 4 10 Point Break
Lemoore 4 9 Wavepool
Teahupoʻo 4 9 Reef Break
Saquarema 4 7 Beach Break
Gold Coast 3 7 Point Break
Sunset Beach 3 6 Reef Break
Keramas 2 7 Reef Break
Capbreton / Hossegor / Seignosse 2 6 Beach Break
Itauna 2 5 Beach Break

Jack Robinson

Location Name # of Events Surfed # of Heats Surfed Break Type
Margaret River 5 16 Reef Break
Pipeline 5 10 Reef Break
Bells Beach 4 11 Point Break
Lower Trestles 4 6 Point Break
Punta Roca 4 6 Point Break
Sunset Beach 3 13 Reef Break
Supertubos 3 10 Beach Break
Teahupoʻo 3 6 Reef Break
Jeffreys Bay 2 7 Point Break
Saquarema 2 3 Beach Break
Itauna 2 2 Beach Break
Lemoore 2 2 Wavepool

Jake Marshall

Location Name # of Events Surfed # of Heats Surfed Break Type
Bells Beach 4 9 Point Break
Pipeline 4 8 Reef Break
Margaret River 4 6 Reef Break
Supertubos 3 7 Beach Break
Sunset Beach 3 6 Reef Break
Punta Roca 3 4 Point Break
Saquarema 2 4 Beach Break
Teahupoʻo 2 4 Reef Break

Joao Chianca

Location Name # of Events Surfed # of Heats Surfed Break Type
Pipeline 3 8 Reef Break
Margaret River 3 7 Reef Break
Punta Roca 3 7 Point Break
Bells Beach 3 5 Point Break
Supertubos 2 7 Beach Break
Sunset Beach 2 5 Reef Break
Lower Trestles 2 4 Point Break
Itauna 2 2 Beach Break
Saquarema 2 2 Beach Break

John John Florence

Location Name # of Events Surfed # of Heats Surfed Break Type
Pipeline 6 22 Reef Break
Margaret River 5 23 Reef Break
Bells Beach 5 14 Point Break
Sunset Beach 3 6 Reef Break
Itauna 2 6 Beach Break
Saquarema 2 6 Beach Break
Teahupoʻo 2 6 Reef Break
Gold Coast 2 5 Point Break
Punta Roca 2 5 Point Break
Supertubos 2 3 Beach Break
Keramas 2 2 Reef Break

Jordy Smith

Location Name # of Events Surfed # of Heats Surfed Break Type
Pipeline 7 19 Reef Break
Margaret River 6 21 Reef Break
Bells Beach 6 12 Point Break
Supertubos 5 10 Beach Break
Teahupoʻo 5 10 Reef Break
Jeffreys Bay 4 12 Point Break
Saquarema 4 12 Beach Break
Punta Roca 4 10 Point Break
Gold Coast 3 9 Point Break
Sunset Beach 3 7 Reef Break
Lemoore 3 5 Wavepool
Capbreton / Hossegor / Seignosse 2 5 Beach Break
Itauna 2 5 Beach Break
Keramas 2 5 Reef Break

Kanoa Igarashi

Location Name # of Events Surfed # of Heats Surfed Break Type
Pipeline 7 15 Reef Break
Bells Beach 6 13 Point Break
Margaret River 6 10 Reef Break
Supertubos 5 14 Beach Break
Teahupoʻo 5 9 Reef Break
Jeffreys Bay 4 14 Point Break
Saquarema 4 9 Beach Break
Lemoore 4 8 Wavepool
Punta Roca 4 6 Point Break
Sunset Beach 3 12 Reef Break
Gold Coast 3 9 Point Break
Keramas 2 7 Reef Break
Lower Trestles 2 6 Point Break
Itauna 2 3 Beach Break
Capbreton / Hossegor / Seignosse 2 2 Beach Break

Leonardo Fioravanti

Location Name # of Events Surfed # of Heats Surfed Break Type
Pipeline 6 21 Reef Break
Margaret River 6 11 Reef Break
Bells Beach 5 6 Point Break
Supertubos 4 8 Beach Break
Punta Roca 3 5 Point Break
Sunset Beach 3 5 Reef Break
Gold Coast 3 3 Point Break
Itauna 2 4 Beach Break
Lemoore 2 4 Wavepool
Teahupoʻo 2 4 Reef Break

Liam O’Brien

Location Name # of Events Surfed # of Heats Surfed Break Type
Pipeline 3 8 Reef Break
Punta Roca 3 6 Point Break
Supertubos 3 6 Beach Break
Bells Beach 3 5 Point Break
Margaret River 3 5 Reef Break
Sunset Beach 2 4 Reef Break
Itauna 2 3 Beach Break
Teahupoʻo 2 3 Reef Break
Lemoore 2 2 Wavepool

Matthew McGillivray

Location Name # of Events Surfed # of Heats Surfed Break Type
Margaret River 5 14 Reef Break
Pipeline 5 7 Reef Break
Punta Roca 4 12 Point Break
Bells Beach 4 11 Point Break
Sunset Beach 3 8 Reef Break
Teahupoʻo 3 5 Reef Break
Supertubos 3 4 Beach Break
Jeffreys Bay 3 3 Point Break
Lemoore 2 3 Wavepool
Itauna 2 2 Beach Break

Miguel Pupo

Location Name # of Events Surfed # of Heats Surfed Break Type
Pipeline 6 15 Reef Break
Margaret River 4 9 Reef Break
Supertubos 4 4 Beach Break
Saquarema 3 6 Beach Break
Bells Beach 3 5 Point Break
Sunset Beach 3 5 Reef Break
Teahupoʻo 2 6 Reef Break
Punta Roca 2 5 Point Break
Lemoore 2 4 Wavepool
Jeffreys Bay 2 3 Point Break

Rio Waida

Location Name # of Events Surfed # of Heats Surfed Break Type
Supertubos 3 9 Beach Break
Bells Beach 3 7 Point Break
Punta Roca 3 5 Point Break
Pipeline 3 4 Reef Break
Margaret River 3 3 Reef Break
Teahupoʻo 2 5 Reef Break
Sunset Beach 2 4 Reef Break
Itauna 2 3 Beach Break

Ryan Callinan

Location Name # of Events Surfed # of Heats Surfed Break Type
Pipeline 6 12 Reef Break
Margaret River 6 11 Reef Break
Bells Beach 5 13 Point Break
Supertubos 4 5 Beach Break
Sunset Beach 3 6 Reef Break
Lemoore 3 5 Wavepool
Punta Roca 3 5 Point Break
Teahupoʻo 3 5 Reef Break
Capbreton / Hossegor / Seignosse 2 9 Beach Break
Itauna 2 5 Beach Break
Jeffreys Bay 2 5 Point Break

Seth Moniz

Location Name # of Events Surfed # of Heats Surfed Break Type
Pipeline 7 16 Reef Break
Margaret River 6 13 Reef Break
Bells Beach 5 8 Point Break
Teahupoʻo 4 8 Reef Break
Supertubos 4 7 Beach Break
Sunset Beach 3 10 Reef Break
Jeffreys Bay 3 4 Point Break
Punta Roca 3 3 Point Break
Gold Coast 2 5 Point Break
Itauna 2 3 Beach Break
Lemoore 2 3 Wavepool
Saquarema 2 3 Beach Break

Yago Dora

Location Name # of Events Surfed # of Heats Surfed Break Type
Pipeline 6 14 Reef Break
Supertubos 5 13 Beach Break
Teahupoʻo 5 11 Reef Break
Bells Beach 5 8 Point Break
Margaret River 5 8 Reef Break
Saquarema 4 10 Beach Break
Punta Roca 4 9 Point Break
Lemoore 4 8 Wavepool
Jeffreys Bay 4 7 Point Break
Gold Coast 3 6 Point Break
Itauna 2 8 Beach Break
Capbreton / Hossegor / Seignosse 2 4 Beach Break
Keramas 2 2 Reef Break
Sunset Beach 2 2 Reef Break

Graphs for analyzing individual locations

These graphs show the distribution of standardized scores at a given location for the 15 surfers with the top Bet365 odds for the current competition season. The odds labels are colored by whether the location is in the surfer’s home country or an away country.

Some of the data in the original data set I was given is incomplete, which affects the results:

Heat information is missing for certain locations, as can be seen by comparing against past competitions. The tables below show how many events and heats each surfer has at each location, to help catch any inconsistencies.

Code
top_odds <- function(surf_break){

top_odds <- ct |>
  filter(locationName == surf_break) |>
  distinct(athlete_name, odds) |>
  group_by(athlete_name) |>
  slice(1) |>
  ungroup() |>
  arrange(odds) |>
  slice_head(n = 15)

ct_top_odds <- ct |>
  semi_join(top_odds, by = "athlete_name") |>
  filter(locationName == surf_break)

range_offset <- 0.05 * diff(range(ct_top_odds$score_std, na.rm = TRUE))

odds_labels <- ct_top_odds |>
  group_by(athlete_name) |>
  summarise(
    whisker_top = max(score_std, na.rm = TRUE),
    home_break  = first(home_break),
    .groups = "drop"
  ) |>
  left_join(top_odds, by = "athlete_name") |>
  mutate(
    y_pos = whisker_top + range_offset + 0.1,
    athlete_name = factor(athlete_name),
    home_lab = factor(home_break,
                      levels = c(TRUE, FALSE),
                      labels = c("Home Country", "Away Country"))
  )

ggplot(ct_top_odds,
       aes(x = reorder(athlete_name, score_std, FUN = median),
           y = score_std, fill = athlete_name)) +
  geom_boxplot(show.legend = FALSE, outlier.shape = NA) +
  geom_text(
    data = odds_labels,
    aes(x = athlete_name, y = y_pos,
        label = paste0("Odds: ", odds),
        color = home_lab),
    inherit.aes = FALSE,
    fontface = "bold", size = 3
  ) +
  scale_color_manual(values = c("Home Country" = "green", "Away Country" = "red")) +
  guides(color = "none") +
  labs(
    title = paste0("Surfers w/ Top 15 Odds at ", surf_break, " — Standardized Scores & Odds"),
    subtitle = "<span style='color:green'>&#9632;</span> Home Country &nbsp;&nbsp; <span style='color:red'>&#9632;</span> Away Country",
    x = "Surfer",
    y = "Standardized Score"
  ) +
  theme_minimal() +
  coord_flip(clip = "off") +
  theme(
    plot.title    = element_text(hjust = 0.5),
    plot.subtitle = element_markdown(hjust = 0.5),
    plot.margin   = margin(5, 60, 5, 5)
  )
}

location_table <- function(surf_break) {
  top_odds <- ct |>
    filter(locationName == surf_break) |>
    distinct(athlete_name, odds) |>
    group_by(athlete_name) |>
    slice(1) |>
    ungroup() |>
    arrange(odds) |>
    slice_head(n = 15)

  ct_top_odds <- ct |>
    semi_join(top_odds, by = "athlete_name") |>
    filter(locationName == surf_break)
  
  counts_tbl <- ct_top_odds |>
    group_by(athlete_name) |>
    summarise(
      events = n_distinct(eventId),
      heats = n(),
      .groups = "drop"
    ) |>
    arrange(desc(events), desc(heats), athlete_name) |>
    select(athlete_name, events, heats) |>
    rename("Surfer Name" = athlete_name,
           "# of Events Competed" = events,
           "# of Heats Competed" = heats)
  counts_tbl |>
    kable()
}
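
The selection and counting steps shared by `top_odds()` and `location_table()` can be sketched on a small made-up data set. The surfers, event IDs, and odds below are hypothetical. The sketch also assumes, as the `arrange(odds)` call above implies, that a smaller odds value means a stronger favourite. It keeps the two surfers with the lowest odds, then counts their events and heats.

```r
library(dplyr)

# Hypothetical rows: one row per heat, with the surfer's odds repeated.
toy_ct <- tibble(
  athlete_name = c("Surfer A", "Surfer A", "Surfer B",
                   "Surfer C", "Surfer C", "Surfer C"),
  locationName = "Bells Beach",
  eventId      = c(1, 2, 1, 1, 2, 3),
  odds         = c(900, 900, 450, 1200, 1200, 1200)
)

# One odds value per surfer, then keep the n lowest values (n = 2 here).
toy_top <- toy_ct |>
  distinct(athlete_name, odds) |>
  group_by(athlete_name) |>
  slice(1) |>
  ungroup() |>
  arrange(odds) |>
  slice_head(n = 2)

# Events and heats per selected surfer; low counts can hint at missing heats.
toy_counts <- toy_ct |>
  semi_join(toy_top, by = "athlete_name") |>
  group_by(athlete_name) |>
  summarise(events = n_distinct(eventId), heats = n(), .groups = "drop") |>
  arrange(desc(events), desc(heats), athlete_name)

print(toy_counts)
```

Surfer C has the most events in this toy data but is excluded because of the odds cutoff, which is why the tables below can list surfers with only one event at a location.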

Bells Beach

Surfer Name # of Events Competed # of Heats Competed
Italo Ferreira 6 15
Kanoa Igarashi 6 13
Jordy Smith 6 12
Filipe Toledo 5 18
Griffin Colapinto 5 15
Connor O’Leary 5 11
Yago Dora 5 8
Leonardo Fioravanti 5 6
Ethan Ewing 4 15
Jack Robinson 4 11
Matthew McGillivray 4 11
Barron Mamiya 4 6
Rio Waida 3 7
Cole Houshmand 2 7
Crosby Colapinto 2 3

Gold Coast

Surfer Name # of Events Competed # of Heats Competed
Filipe Toledo 3 10
Jordy Smith 3 9
Kanoa Igarashi 3 9
Griffin Colapinto 3 7
Italo Ferreira 3 7
Yago Dora 3 6
Leonardo Fioravanti 3 3
Connor O’Leary 2 3
Joao Chianca 1 2
Marco Mignot 1 2
Rio Waida 1 2
Ethan Ewing 1 1
Jack Robinson 1 1
Joel Vaughan 1 1
Samuel Pupo 1 1

Itauna

Surfer Name # of Events Competed # of Heats Competed
Yago Dora 2 8
Ethan Ewing 2 6
Griffin Colapinto 2 5
Italo Ferreira 2 5
Jordy Smith 2 5
Barron Mamiya 2 4
Connor O’Leary 2 4
Leonardo Fioravanti 2 4
Kanoa Igarashi 2 3
Jack Robinson 2 2
Joao Chianca 2 2
Cole Houshmand 1 2
Crosby Colapinto 1 2
Filipe Toledo 1 1
Jake Marshall 1 1

Lower Trestles

Surfer Name # of Events Competed # of Heats Competed
Italo Ferreira 4 12
Ethan Ewing 4 9
Filipe Toledo 4 9
Jack Robinson 4 6
Griffin Colapinto 3 5
Kanoa Igarashi 2 6
Joao Chianca 2 4
Yago Dora 1 4
Barron Mamiya 1 2
Cole Houshmand 1 2
Crosby Colapinto 1 2
Leonardo Fioravanti 1 2
Jake Marshall 1 1
Jordy Smith 1 1
Kelly Slater 1 1

Margaret River

Surfer Name # of Events Competed # of Heats Competed
Griffin Colapinto 6 21
Jordy Smith 6 21
Italo Ferreira 6 14
Seth Moniz 6 13
Leonardo Fioravanti 6 11
Kanoa Igarashi 6 10
Jack Robinson 5 16
Matthew McGillivray 5 14
Filipe Toledo 5 12
Ethan Ewing 5 11
Yago Dora 5 8
Barron Mamiya 4 11
Jake Marshall 4 6
Joao Chianca 3 7
George Pittar 2 5

Pipeline

Surfer Name # of Events Competed # of Heats Competed
Jordy Smith 7 19
Italo Ferreira 7 18
Kanoa Igarashi 7 15
Filipe Toledo 7 12
Griffin Colapinto 7 12
Leonardo Fioravanti 6 21
Yago Dora 6 14
Gabriel Medina 5 18
Jack Robinson 5 10
Ethan Ewing 5 9
Samuel Pupo 4 6
Joao Chianca 3 8
Rio Waida 3 4
Crosby Colapinto 1 2
Marco Mignot 1 1

Punta Roca

Surfer Name # of Events Competed # of Heats Competed
Matthew McGillivray 4 12
Griffin Colapinto 4 11
Italo Ferreira 4 10
Jordy Smith 4 10
Connor O’Leary 4 8
Ethan Ewing 4 8
Barron Mamiya 4 6
Jack Robinson 4 6
Kanoa Igarashi 4 6
Filipe Toledo 3 9
Joao Chianca 3 7
Leonardo Fioravanti 3 5
Jake Marshall 3 4
Crosby Colapinto 2 7
Cole Houshmand 2 6

Saquarema

Surfer Name # of Events Competed # of Heats Competed
Filipe Toledo 4 15
Jordy Smith 4 12
Yago Dora 4 10
Griffin Colapinto 4 9
Kanoa Igarashi 4 9
Italo Ferreira 4 7
Connor O’Leary 3 4
Ethan Ewing 2 6
Jack Robinson 2 3
Joao Chianca 2 2
Cole Houshmand 1 4
Barron Mamiya 1 1
Crosby Colapinto 1 1
Leonardo Fioravanti 1 1
Marco Mignot 1 1

Teahupoʻo

Surfer Name # of Events Competed # of Heats Competed
Yago Dora 5 11
Jordy Smith 5 10
Griffin Colapinto 5 8
Italo Ferreira 4 9
Seth Moniz 4 8
Connor O’Leary 4 5
Kauli Vaast 3 9
Barron Mamiya 3 7
Jack Robinson 3 6
Miguel Pupo 2 6
Leonardo Fioravanti 2 4
Mihimana Braye 2 4
Ethan Ewing 2 3
Cole Houshmand 1 2
Joao Chianca 1 1